Non-communicable diseases like diabetes, obesity and certain forms of cancer have been increasing
in many countries at alarming levels. A difficulty in the conception of policies to reverse these
trends is the identification of the drivers behind the global epidemics. Here, we implement a spa- 
tial spreading analysis to investigate whether non-communicable diseases like diabetes, obesity and 
cancer show spatial correlations revealing the effect of collective and global factors acting above 
individual choices. Specifically, we adapt a theoretical framework for critical physical systems dis- 
playing collective behavior to decipher the laws of spatial spreading of diseases. We find a regularity 
in the spatial fluctuations of their prevalence revealed by a pattern of scale-free long-range correla- 
tions. The fluctuations are anomalous, deviating in a fundamental way from the weaker correlations 
found in the underlying population distribution. The resulting scaling exponents allow us to broadly 
classify the indicators into two universality classes, weakly or strongly correlated. This collective be- 
havior indicates that the spreading dynamics of obesity, diabetes and some forms of cancer like lung 
cancer are analogous to a critical point of fluctuations, just as a physical system in a second-order 
phase transition. According to this notion, individual interactions and habits may have negligible 
influence in shaping the global patterns of spreading. Thus, obesity turns out to be a global prob- 
lem where local details are of little importance. Interestingly, we find the same critical fluctuations 
in obesity and diabetes, and in the activities of economic sectors associated with food production 
such as supermarkets, food and beverage stores — which cluster in a different universality class than 
other generic sectors of the economy. These results motivate future interventions to investigate the 
causality of this relation providing guidance for the implementation of preventive health policies. 




The World Health Organization has recognized obesity
as a global epidemic [1]. Obesity heads the list of
non-communicable diseases (NCDs) like diabetes and cancer,
for which no prevention strategy has managed to control
their spreading [2-7]. Since the gain of excessive body
weight is related to an increase in calorie intake and
physical inactivity [8-10], a principal aspect of
prevention has been directed to individual habits [11].
However, the prevalence of NCDs shows strong spatial
clustering [12-14]. Furthermore, obesity spreading has
shown high susceptibility to social pressure [6] and global
economic drivers [3-5, 7]. This suggests that the spread
and growth of obesity and other NCDs may be governed
by collective behavior acting over and above individual
factors such as genetics and personal choices [4, 5].

To study the emergence of collective dynamics in the
spatial spreading of obesity and other NCDs, we implement
a statistical clustering analysis based on the physics of
critical phenomena. We start by investigating regularities
in obesity spreading derived from correlation patterns of
demographic variables. Obesity is determined through the
Body Mass Index (BMI) obtained via the formula
weight (kg)/[height (m)]$^2$. The obesity prevalence is
defined as the percentage of adults aged $\geq 18$ years
with a BMI $\geq 30$. We investigate the spatial
correlations of obesity prevalence in the USA during a
specific year using micro-data defined at the county level
provided by the US Centers for Disease Control (CDC) [14]
through the Behavioral Risk Factor Surveillance System
(BRFSS) from 2004 to 2008 (see Methods Section I).
The average percentage of obesity in the USA was
historically around 10%. In the early 1980s, an obesity
transition set in: the hitherto robust percentage
increased steeply (Fig. 1a).

The spatial map of obesity prevalence in the USA
shows that neighboring areas tend to present similar
percentages of obese population [14], forming spatial
'obesity clusters' [12, 13]. The evolution of the spatial
map of obesity from 2004 to 2008 at the county level
(Fig. 1b) highlights the mechanism of cluster growth.
Characterizing such geographical spreading presents a
challenge to current theoretical physics frameworks of
cluster dynamics [15-20]. The equal-time two-point
correlation function, $C(r)$, determines the properties of
such spatial arrangement by measuring the influence of an
observable $x_i$ in county $i$ (e.g., in this study: adult
population density, prevalence of obesity and diabetes,
cancer mortality rates and economic activity) on another
county at distance $r$ [18]:

$$C(r) \equiv \frac{1}{\sigma_x^2} \frac{\sum_{ij} (x_i - \bar{x})(x_j - \bar{x})\, \delta(r_{ij} - r)}{\sum_{ij} \delta(r_{ij} - r)}. \qquad (1)$$

Here, $\bar{x}$ is the average over all $N = 3,141$ counties in the
USA, $\sigma_x = \sqrt{\sum_i (x_i - \bar{x})^2 / N}$ is the standard deviation, the delta function selects the pairs of counties at distance $r$, and $r_{ij}$
FIG. 1: The obesity transition. a, CDC [14] provides an
estimate of the number of obese adults, based on self-reported
weight and height, country-wide since 1970 (blue line), at
the state level from 1984 to 2009 (red symbols), and at the
county level from 2004 to 2008 (green symbols). A transition
is observed around 1980. We base our analysis on the
micro-data at the county level. b, Map of the spatial spreading of
obesity prevalence evidencing clustering dynamics. c, Map
of the population density defined at the county level in 2003
showing correlated patterns, albeit with less clustering than in
obesity. d, Map of cancer mortality rates per county in 1970
and 2003 visualizing the transition from high correlations and
clustering in 1970 to weak correlation and more uniformity in 2003.
e, Map of lung cancer mortality per county indicating large
clustering properties similar to obesity.



is the Euclidean distance between the geometrical centers
of counties $i$ and $j$. Large positive values of $C(r)$
reveal strong correlations, while negative values imply
anti-correlations, i.e., two areas with opposed tendencies
relative to the mean in obesity prevalence (analogous to
two domains with opposite spins in a ferromagnet [15]).
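
The correlation function described above can be estimated numerically by binning county pairs by distance. The following is a minimal sketch, not the authors' code: the function name `correlation_function`, the use of planar centroid coordinates, and the choice of distance bins are all assumptions made for illustration.

```python
import numpy as np

def correlation_function(x, coords, bin_edges):
    """Binned estimate of the two-point correlation function C(r).

    x         : observable per county (e.g. obesity prevalence)
    coords    : (N, 2) planar coordinates of county centers (assumed input)
    bin_edges : increasing distance bin edges, same units as coords
    Returns (C, counts): C per bin and the number of pairs in each bin.
    """
    x = np.asarray(x, dtype=float)
    coords = np.asarray(coords, dtype=float)
    dx = x - x.mean()                 # fluctuations around the mean
    var = np.mean(dx ** 2)            # variance, sum_i (x_i - mean)^2 / N

    # Pairwise Euclidean distances; O(N^2) memory, fine for a few thousand counties.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Use each unordered pair (i < j) once; the ratio in C(r) is unaffected.
    i, j = np.triu_indices(len(x), k=1)
    r = dist[i, j]
    prod = dx[i] * dx[j]

    n_bins = len(bin_edges) - 1
    idx = np.digitize(r, bin_edges) - 1
    valid = (idx >= 0) & (idx < n_bins)
    sums = np.bincount(idx[valid], weights=prod[valid], minlength=n_bins)
    counts = np.bincount(idx[valid], minlength=n_bins)

    # Empty bins produce NaN rather than raising.
    with np.errstate(invalid="ignore", divide="ignore"):
        c = sums / (counts * var)
    return c, counts
```

In this sketch the delta function of the definition is replaced by a finite distance bin, so the bin width sets the resolution in $r$; real county centroids would first have to be projected to planar coordinates, which is not shown here.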

FIG. 2: Long-range correlations in spreading phenomena.
a, Correlation function, $C(r)$, averaged over counties
at distance $r$ for population density from 1969-2009 and
obesity prevalence from 2004-2008. The lines are fittings based
on OLS regression analysis [22]. The inset shows the
correlation length at $C(\xi) = 0$ and highlights the fact that $\xi$ is
approximately the same for the population, obesity and
diabetes prevalence (data for 2004). The inset also highlights
the anticorrelations for $r > \xi$. b, Population density
correlation function, $C(r)$ vs $r$, for different countries in 2009 as
indicated. c, Correlation length $\xi$ vs linear country size $L$ for
different countries. The symbols indicate the same countries
as in Fig. 2b. The remaining star symbols are for other
countries as indicated in Table I. $L$ is the square root of the total
area of the country.

Spatial correlations in any indicator ought to be referred
to the natural correlations of population fluctuations
(Fig. 1c). To this aim, we first calculate $C(r)$ for the
population (adults $\geq 18$ years) in USA counties, $p_i$, by
using the density $x_i = p_i/a_i$ in Eq. (1), where $a_i$ is the
county area. Population density correlations show a slow
fall-off with distance (Fig. 2a) described by a power-law
up to a correlation length $\xi$:

$$C(r) \sim r^{-\gamma}, \qquad r < \xi, \qquad (2)$$

where $\gamma$ is the correlation exponent. Correlations
become short-ranged when $\gamma > d$ ($d = 2$ is the dimension
of the map), and stronger as $\gamma$ decreases [15, 18].
Ordinary Least Squares (OLS) regression analysis [22]
on the population reveals the exponent $\gamma = 1.01 \pm 0.08$
(averaged over 1969-2009, Fig. 2a; error bars denote the
95% confidence interval [CI]). The inset of Fig. 2a
reveals a distance where correlations vanish, $C(\xi) = 0$ with
$\xi = 1050$ km, representing the average size of the
correlated domains [23]. As we increase $r$ beyond $\xi$, we
consider correlations between areas in the East and West,
which are anti-correlated since $C(r) < 0$ for $r > \xi$.
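
As an illustration of how the two quantities in this paragraph could be extracted from a measured $C(r)$, the sketch below fits $\gamma$ by OLS on log-log axes and locates $\xi$ as the first zero crossing. The helper names, the optional `r_max` cutoff, and the linear interpolation of the zero crossing are assumptions for this example; the text does not specify these details.

```python
import numpy as np

def fit_power_law_exponent(r, c, r_max=None):
    """OLS fit of log C(r) = log A - gamma * log r.

    Only points with r > 0 and C(r) > 0 (and r < r_max, if given) are used,
    since the power law is expected to hold only below the correlation length.
    Returns (gamma, A).
    """
    r = np.asarray(r, dtype=float)
    c = np.asarray(c, dtype=float)
    mask = (r > 0) & (c > 0)
    if r_max is not None:
        mask &= r < r_max
    slope, intercept = np.polyfit(np.log(r[mask]), np.log(c[mask]), 1)
    return -slope, np.exp(intercept)

def correlation_length(r, c):
    """First distance at which C(r) crosses zero from above.

    Uses linear interpolation between the two neighboring points and
    returns None when C(r) never changes sign.
    """
    r = np.asarray(r, dtype=float)
    c = np.asarray(c, dtype=float)
    crossings = np.where((c[:-1] > 0) & (c[1:] <= 0))[0]
    if crossings.size == 0:
        return None
    k = crossings[0]
    return r[k] + (r[k + 1] - r[k]) * c[k] / (c[k] - c[k + 1])
```

A 95% confidence interval on the slope, as quoted in the text, would additionally require the standard error of the regression, which this sketch omits.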

To determine whether population correlations are
scale-free, we calculate $C(r)$ for geographical systems of
different sizes using a high resolution grid of 2.5
arc-seconds, available for several countries from [24]
(Methods Section I A). The resulting correlations (Fig. 2b)
reveal the same picture as for the USA at the county level
(Fig. 2a), i.e., a power-law up to a correlation length.
We then measure $\xi$ for every country, and investigate
whether, as expected from the laws of critical phenomena
[15, 21], it increases with the country size, $L$. Indeed,
we obtain (Fig. 2c and Table I),

$$\xi \sim L^{\nu}, \qquad (3)$$

where $\nu = 0.9 \pm 0.1$ is the correlation length exponent
[15]. This result implies that the fluctuations in human
agglomerations are scale-free, i.e., the only length-scale
in the system is set by its size, and the correlation length
becomes infinite when $L \to \infty$ [15, 21, 23].

We interpret any departure from $\gamma = 1$ as a proxy of
anomalous dynamics beyond the simple dynamics related
to the population growth. When we calculate the spatial
correlations of obesity prevalence ($s_i = o_i/p_i$, where $o_i$ is the
number of obese adults in county $i$) in the USA from 2004
to 2008 we also find long-range correlations (Fig. 2a).
The crux of the matter is that the correlation exponent
for obesity ($\gamma = 0.50 \pm 0.04$, averaged over 2004-2008) is
smaller than that of the population, signaling anomalous
growth. Since smaller exponents mean stronger correlations,
the increase in obesity prevalence in a given place
can eventually spread significantly further than expected
from the population dynamics.

We also calculate fluctuations in variables which are
known to be strongly related to obesity [8, 9, 12, 25]:
diabetes and physical inactivity prevalence (fraction of
adults per county who report no physical activity or
exercise, see Methods Section I). The obtained $\gamma$ exponents
are anomalous with similar values as in obesity (Fig. 3a).
The system size dependence of $\xi$ for obesity and diabetes
cannot be measured directly, since there is no available
micro-data for other countries. However, we find that
$\xi$ of obesity and diabetes in the USA is very close to $\xi$ of
the population distribution (inset of Fig. 2a). Assuming
that the equality of the correlation lengths holds also for
other countries, then obesity and diabetes should satisfy
Eq. (3) as well.
FIG. 3: Correlation exponents. a, Temporal evolution
of $\gamma$ for population distribution, obesity, diabetes, physical
inactivity, all cancer mortality, and lung cancer mortality per
county. The diagram displays the classes of strong
correlations, $\gamma_{st} = 0.5$, and weak correlations, $\gamma_{wk} = 1$. Additionally,
theory predicts $\gamma_{rnd} > 2$ for uncorrelated systems. We did not
observe any human activity or indicators whose correlations
fall within this class, unless the data of different counties are
shuffled. b, Evolution of $\gamma$ for different economic indicators
describing the food industry, the whole economy and generic
economic sectors as indicated. We quantify economic activity
by the total number of employees of a given sector per county
population. c, Correlation functions for the economic
activities indicated in the figure. The plot shows the segregation
of the data into two classes. The solid lines indicate $\gamma_{wk} = 1$
and $\gamma_{st} = 1/2$. d, Change in $C(r)$ for cancer mortality rates
in the period 1970-2003.



The correlations in obesity are reminiscent of those
in physical systems at a critical point of second-order
phase transitions [15, 21, 23]. Physical systems away
from criticality are uncorrelated and fluctuations in
observables, e.g., magnetization in a ferromagnet or
density in a fluid, decay faster than a power-law, e.g.,
exponentially [15, 21]. Instead, long-range correlations
appear at critical points of phase transitions where
fluctuations are not independent and, as a consequence,
fall off more slowly. The existence of long-range correlations
with $\gamma = 0.5$, rather than the noncritical exponential
decay, signals the emergence of strong critical
fluctuations in obesity and diabetes spreading. The notion
of criticality, initially developed for equilibrium systems
[15, 21], has been successfully extended to explain a wide
variety of dynamics away from equilibrium ranging from
collective behaviour of bird cohorts, biological and social
systems to city growth, just to name a few [21, 23, 26, 27].
Its most important consequence is that it characterizes a
system for which local details of interactions have a
negligible influence in the global dynamics [15, 21]. Following
this framework, the clustering patterns of obesity are
interpreted as the result of collective behaviors which are
not merely the consequence of fluctuations of individual
habits.

As a tentative way of addressing which elements of the
economy may be related to the obesity spread, we
calculate $\gamma$ in economic indicators which are thought to be
involved in the rise in obesity [4, 5]. Except for
transient phenomena, all studied indicators yield exponents
that fall around $\gamma_{wk} = 1$ or $\gamma_{st} = 1/2$, representing two
universality classes of weak and strong correlations,
respectively (Figs. 3a and b).

We begin by studying the correlations in the whole
economy (measured through the number of employees of
all economic sectors per county population, see
Methods Section I). We find $\gamma$ close to $\gamma_{wk} = 1$ (over the
period 1998-2009, Fig. 3b and c), suggesting that the
whole economy inherits the correlations in the
population. Generic sectors of the economy which are not
believed to be drivers of obesity, e.g., wholesalers,
administration, and manufacturing, also display $\gamma$ consistent
with the population trend (Fig. 3b and c).

Interestingly, analysis of the spatial fluctuations in the
economic activity of sectors associated with food
production and sales (supermarkets, food and beverage stores
and food services such as restaurants and bars) gives
rise to the same anomalous value as obesity and diabetes
($\gamma_{st} = 1/2$, 1998-2009, Fig. 3b and c). Although these
results cannot inform about the causality of these relations,
they show that the scaling properties of the obesity
patterns display a spatial coupling which is also expressed
by the fluctuations of sectors of the economy related to
food production.

It is of interest to study other health indicators for
which active health policies have been devoted to controlling
the rate of growth. We apply the correlation analysis
to lung cancer mortality defined at the county level and
compare with cancer mortality due to all types (Methods
Section I). The spatial correlations of cancer mortality
per county show an interesting transition in the late 1970s
from anomalous strong correlations, $\gamma_{st} = 1/2$, to weak
correlations, $\gamma_{wk} = 1$ (Fig. 3a and d). This transition
is visualized in the different correlated patterns of cancer
mortality in 1970 and 2003 in Fig. 1d, i.e., the clustering
of the data is more pronounced in 1970, while in 2003 it
spreads more uniformly. The current status of all-cancer
mortality fluctuations is close to the natural one, imposed
by population correlation. Conversely, fluctuations in the
mortality rate due to lung cancer from 1970 to 2003 have
remained highly correlated and close to the obesity value,
$\gamma_{st} = 1/2$ (Figs. 3a and 1e), while the other types of
cancer have become less correlated. This is an interesting
finding since, similarly to obesity, lung cancer prevalence
is affected by a global factor (smoking) and has been
growing rapidly during the studied period.

The most visible characteristic of correlations is the 
formation of spatial clusters of obesity prevalence. To 
quantitatively determine the geographical formation of 
FIG. 4: Percolation picture of obesity. a, Size of the
largest component (circles) and second largest component
(squares) as a function of the obesity prevalence threshold
$s$ in 2008. As we lower $s$, the size of the largest component
increases by absorbing smaller clusters. In many cases, this is
done abruptly, indicating that whole clusters have been
incorporated into the largest cluster, as evidenced by the peaks in the
second largest cluster and the jumps in the size of the largest
clusters [16]. We observe two main transitions at $s_{c1}$ and $s_{c2}$
in the real data (red) and a single percolation second-order
transition in the randomized data (blue). The maps show the
progression of the obesity clusters with at least 5 counties for
a given $s$. b, Percolation tree representing the hierarchical
formation, growth and merging of obesity clusters. Each dot
represents a cluster at a given $s$ with a size proportional to
the logarithm of the cluster's area. Cluster colors follow Fig.
4a and we separate them by the indicated geographic regions.
As we lower $s$ from right to left, regions of high obesity
prevalence appear first in the tree. We notice the main percolating
cluster starting in the lower Mississippi basin (red) at high $s$
and absorbing the other clusters until percolating through all
the US. In particular, we note the two main transitions at $s_{c1}$,
where it absorbs the two Appalachian clusters, and at $s_{c2}$,
where it absorbs the West US cluster.



obesity clusters, we implement a percolation analysis
[16-20, 28]. The control parameter of the analysis is the
obesity threshold, $s$. An obesity cluster is a maximally
connected set of counties for which $s_i$ exceeds a given
threshold $s$: $s_i > s$. By decreasing $s$, we monitor the progressive
formation, growth and merging of obesity clusters.
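
The threshold procedure just described can be sketched as a connected-components search on the county adjacency graph. This is an illustrative sketch only: the dictionary-based inputs, the function names, and the choice to measure cluster size in number of counties (rather than area) are assumptions, not details taken from the text.

```python
def cluster_sizes(values, neighbors, s):
    """Sizes of maximally connected clusters of counties with value > s.

    values    : dict county_id -> prevalence s_i
    neighbors : dict county_id -> iterable of adjacent county ids (assumed input)
    s         : prevalence threshold
    Returns a list of cluster sizes (number of counties), largest first.
    """
    active = {c for c, v in values.items() if v > s}
    seen = set()
    sizes = []
    for start in active:
        if start in seen:
            continue
        # Depth-first search over active neighboring counties.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in neighbors.get(node, ()):
                if nb in active and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        sizes.append(size)
    return sorted(sizes, reverse=True)

def percolation_sweep(values, neighbors, thresholds):
    """Largest and second largest cluster size for decreasing thresholds.

    Returns a list of (s, largest, second_largest) tuples, from high to low s.
    """
    rows = []
    for s in sorted(thresholds, reverse=True):
        sizes = cluster_sizes(values, neighbors, s)
        largest = sizes[0] if sizes else 0
        second = sizes[1] if len(sizes) > 1 else 0
        rows.append((s, largest, second))
    return rows
```

In such a sweep, a merging event shows up as a jump in the largest cluster accompanied by a drop in the second largest one; shuffling the values between counties before the sweep would give the randomized comparison.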
In random uncorrelated percolation [16], small clusters
FIG. 4: c, Detail of the evolution of obesity clusters near
percolation as indicated. The map shows the shape of the
first (red), second (yellow), and third (violet) clusters around
$s_{c1}$, and the largest (green) cluster at $s_{c2}$, together with the
location of the red bonds responsible for the transitions. The
epicenter is Greene county, AL with 43.7% obesity prevalence.
d, Box fractal dimension of the percolating cluster in the inset
measured by the number of boxes of size $\epsilon$ needed to cover the
cluster: $N_B(\epsilon) \sim \epsilon^{-d_f}$, and fractal dimension of the boundary
measured by the number of boxes needed to cover the hull:
$N_h(\epsilon) \sim \epsilon^{-d_e}$. e, Probability distribution of the area of the
obesity clusters, $P(A) \sim A^{-\tau}$, at percolation $s_{c1}$, averaged over
2004-2008. This scaling law generalizes Zipf's law [26] from
urban to obesity clusters.



would be formed in a spatially uniform way until a
critical value, $s_c$, is reached, and an incipient cluster spans
the entire system. Instead, when we analyze the obesity
clusters we observe a more complex pattern, exemplified
in Figs. 4a and 4c for the year 2008. At large $s$, the first
cluster appears in the lower Mississippi basin (red in Fig. 4a)
with its epicenter in Greene county, AL. Upon decreasing $s$
to 0.32, new clusters are born, including two spanning the
South and North of the Appalachian Mountains, which
act as a geographical barrier separating the second and
third largest clusters (yellow and violet in Fig. 4c,
respectively). Further lowering $s$, we observe a percolation
transition in which the Appalachian clusters merge with
the Mississippi cluster. This point is revealed by a jump
in the size of the largest component and a peak in the
second largest component at $s_{c1} = 0.314$ (Fig. 4a), as
features of a percolation transition [16]. At $s_{c1}$, three
"red bonds" (McLean county, KY, DeKalb county, TN,
and Colquitt county, GA) appear to connect the incipient
largest cluster spanning the East of the USA (see Fig. 4c
and Methods Section II). As a comparison, when we
randomize the obesity data by shuffling the values between
counties, a single $s_c = 0.29$ appears as a signature of
an uncorrelated percolation process (blue symbols in Fig. 4a).

Obesity clusters in the West remain segregated from
the main Eastern cluster, avoiding a full-country
percolation, due to low-prevalence areas around Colorado state.
Finally, the East and West clusters merge at $s_{c2} = 0.256$
through a red bond (Rich county, Utah), producing a
second percolation transition, this time spanning the whole
country (see Figs. 4a and c, where the whole spanning
cluster is green, and Methods Section II). This
cluster-merging process is a hierarchical percolation progression
represented in the tree model in Fig. 4b.

To further inquire whether the spreading of obesity has
the features of a physical system at the critical point,
we examine the geometry and distribution of obesity
clusters. For long-range correlated critical systems
percolating through nearest neighbors in two-dimensional
maps, the geometrical structure [16, 19, 20] gives rise
to three critical exponents: the fractal dimension of the
spanning cluster, $d_f$, the fractal dimension of the hull,
$d_e$, and the cluster size distribution exponent, $\tau$,
analogous to Zipf's law [26] (Methods Section III). For the
percolating obesity cluster at $s_{c1}$ displayed in the inset
of Fig. 4d, we confirm critical scaling with exponents:
$(d_f, d_e, \tau) = (1.79 \pm 0.08, 1.37 \pm 0.06, 1.9 \pm 0.1)$ (Fig. 4d,
e).
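
One possible way to estimate a distribution exponent such as $\tau$ from a list of cluster areas is the maximum-likelihood estimator for a continuous power law, sketched below. This is a swapped-in technique for illustration: the text does not state how $\tau$ was fitted, and the function names, the lower cutoff `a_min`, and the synthetic sampler are assumptions.

```python
import numpy as np

def estimate_tau(areas, a_min):
    """Maximum-likelihood estimate of tau for P(A) ~ A^(-tau), A >= a_min.

    Uses the standard continuous power-law estimator
    tau = 1 + n / sum(log(A_i / a_min)) over the n areas above the cutoff.
    """
    a = np.asarray(areas, dtype=float)
    a = a[a >= a_min]
    return 1.0 + a.size / np.log(a / a_min).sum()

def sample_power_law(tau, a_min, n, rng):
    """Draw n synthetic areas from P(A) ~ A^(-tau) by inverse-transform sampling."""
    u = rng.random(n)                      # uniform in [0, 1)
    return a_min * (1.0 - u) ** (-1.0 / (tau - 1.0))

# Usage on synthetic data only (no real cluster areas are used here).
rng = np.random.default_rng(0)
synthetic_areas = sample_power_law(1.9, 1.0, 200_000, rng)
tau_hat = estimate_tau(synthetic_areas, 1.0)
```

With only a handful of clusters per year, as in a single thresholded map, the statistical error of such an estimate is large, which may be one reason to average over several years as done in the text.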

Taken together, these results show that obesity spreading
behaves as a self-similar strongly-correlated critical
system [15, 21]. In particular, a note of caution has to be
raised since, even if the highest prevalence of obesity is
localized to the South and Appalachia, the scaling
analysis indicates that the obesity problem is the same
(self-similar) across all the USA, including the lower prevalence
areas.

Our results cannot establish a causal relation between
obesity prevalence and economic indicators: whether
fluctuations in the food economy may impact obesity or,
instead, whether the food industry reacts to obesity
demands. However, the comparative similarities of
statistical properties of demographic and economic variables
serve to identify possible candidates which shape the
epidemic. Specifically, the observation of a common
universality class in the fluctuations of obesity prevalence
and economic activity of supermarkets, food stores and
food services, which cluster in a different universality
class than simple population dynamics, is in line with
studies proposing that an important component of the
rise of obesity is linked to the obesogenic environment
[3, 29] regulated by food market economies [4, 5, 9]. This
result is consistent with recent research that relates
obesity with residential proximity to fast-food stores and
restaurants [7, 30]. The present analysis based on
clustering and critical fluctuations is a supplement to studies
of association between people's BMI and the food
environment based on covariance (Methods Section IV).
We have detected potential candidates in the economy
which relate to the spreading of obesity by showing the
same universal fluctuation properties. These tentative
relations ought to be corroborated by future intervention
studies.
Methods



I. DATASETS 

Obesity is determined through the Body Mass Index
(BMI), which compares the weight and height of an
individual via the formula weight (kg)/[height (m)]$^2$. A BMI
value of 30 is considered the obesity threshold.
Overweight but not obese is $25 <$ BMI $< 30$, and underweight
is BMI $< 18.5$. Our main measure in this work is the adult
obesity prevalence of a county, $s_i = o_i/p_i$, for a given year,
defined as the number of obese adults $o_i$ (BMI $\geq 30$) in a
county $i$ over the total number of adults in this county,
$p_i$. We use the data from the US Centers for Disease
Control (CDC) downloaded from [14]. CDC provides an
estimate of the obesity country-wide since 1970, at the
state level from 1984 to 2009, and at the county level
from 2004 to 2008. The study of the correlation function
$C(r)$ requires high resolution data. Therefore, we use
data defined at the county level and restrict our study of
obesity and diabetes to the available period 2004-2008.
Other indicators are provided by different agencies at the
county level for longer periods.

The datasets analyzed in this paper were obtained from
the websites as indicated below. They can be downloaded
as a single tar datafile from jamlab.org. The datasets
consist of a list of populations and other indicators at
specific counties in the USA at a given year. A graphical
representation of the obesity data can be seen in Fig. 1b
for the USA from 2004 to 2008, where each point in the maps
represents a data point of obesity prevalence directly
extracted from the dataset.

The datasets that we use in our study have been col- 
lected from the following sources: 

(a) Population.— US Census Bureau. We download
a number of datasets at the county level from
http://www.census.gov/support/USACdataDownloads.html

- For the population estimates we use the table
PIN030. For the years 1969-2000 we use data supplied
by BEA (Bureau of Economic Analysis) and for years
2000-2009 we use the file CO-EST2009-ALLDATA.csv
from http://www.census.gov/popest/counties/files/CO-EST2009-ALLDATA.CSV

(b) Health indicators.— Data downloaded from the
Centers for Disease Control and Prevention (CDC).

http://apps.nccd.cdc.gov/DDT_STRS2/NationalDiabetesPrevalenceEstimates.aspx

The center provides county estimates between the 
years 2004-2008 for: 

- Diagnosed diabetes in adults. 

- Obesity prevalence in adults. 

- Physical inactivity in adults. 

The estimates for obesity and diabetes prevalence and
leisure-time physical inactivity were derived by the CDC
using data from the census and the Behavioral Risk
Factor Surveillance System (BRFSS) for 2004, 2005, 2006,
2007 and 2008. BRFSS is an ongoing, state-based,
random-digit-dialed telephone survey of the U.S.
civilian, non-institutionalized population aged 18 years and
older. The analysis provided by the BRFSS is based
on self-reported data, and estimates are age-adjusted on
the basis of the 2000 US standard population. Full
information about the methodology can be obtained at
http://www.cdc.gov/diabetes/statistics.

(c) Economic indicators.— We download data for
economic activity through http://www.census.gov/econ/.
The economic activity of each sector is measured as the
total number of employees in this sector per county in a
given year normalized by the population of the county.
The North American Industry Classification System (NAICS)
(http://www.census.gov/epcd/naics02/naicod02.htm)
hierarchically assigns a number based on the particular
economic sector. The NAICS is the standard used by US
statistical agencies in classifying business establishments
across the US business economy.

In this study we have used the following economic sec- 
tors with their corresponding NAICS: 

• Whole economy. Entire output of all economic sec- 
tors combined including all NAICS codes. 

• 31. Manufacturing. Broad economic sector from 
textiles, to construction materials, iron, machines, 
etc. 

• 42. Wholesale trade. Very broad sector including 
merchants wholesalers, motors, furniture, durable 
goods, etc. 

• 56. Administrative jobs and support services. 

• 445. Food and beverage stores. Including all the
food sectors, from supermarkets, fish, vegetable and
meat markets, to restaurants and bars and other
services to the food industry.

• 44511. Supermarkets and other grocery (except 
convenience) stores. This is a subsection of NAICS 
445. 

• 722. Food services and drinking places. A sub- 
sector of NAICS 72 which includes restaurants, 
cafeterias, snacks and nonalcoholic beverage bars, 
caterers, bars and drinking places (alcoholic bever- 
ages) . 

(d) Mortality rates.— We use data from the
National Cancer Institute SEER (Surveillance,
Epidemiology and End Results) downloaded from
http://seer.cancer.gov/data/
The Institute provides mortality data from 1970 to
2003, aggregated every three years. We analyze the
mortality of a specific form of cancer per county normalized
by the population of the county. Here, we use mortality
data for the following causes of death:

- All cancer, independently of type. 

- Lung cancer. 



A. Gridded data of population from CIESIN 

We take advantage of the available data of population
distribution around the globe defined on a square grid of
2.5 arc-seconds obtained from [24]. These data allow us to
study the correlation functions of the population
distribution for many countries. By using these data we are able
to test the system size dependence of our results. We find
that the correlation length $\xi$ is proportional to the linear
size of the country, $L$. The linear size is calculated as
$L = \sqrt{\text{Total Area}}$. We find that the correlation length scales with
the system size as discussed in the text. For instance, for
the USA population distribution we find $\xi = 1050$ km,
while a smaller country like the UK has $\xi = 321$ km.

Table I shows a list of countries used in Figs. 2b and
c to determine the correlation length $\xi$ of the correlation
function of population density.



II. EVOLUTION OF OBESITY CLUSTERS 
NEAR PERCOLATION 

The shape of the main obesity clusters and the location
of the red bonds and obesity epicenter are depicted in
Fig. 4c, overlaid with a US map showing the
boundaries of states and counties. Figure 4c shows the obesity
clusters obtained at $s = 0.318$, $s_{c1} = 0.314$, $s = 0.310$,
and $s_{c2} = 0.256$, depicting the process of percolation. At
$s = 0.318$, we plot the largest red cluster, which is seen in
the lower Mississippi basin. The highest obesity
prevalence is in Greene county, AL, which acts as the
epicenter of the epidemic. At $s_{c1}$, we plot in yellow the second
largest cluster in the Atlantic region south of the
Appalachian Mountains, and at $s = 0.310$ we plot the third
largest cluster (violet), which appears north of the
Appalachian Mountains. We mark with black the three red
bonds that make the Mississippi cluster grow abruptly
by absorbing the clusters in the Appalachian range. The
red bonds are DeKalb county, TN, McLean county, KY,
and Colquitt county, GA. This transition is reflected in
the jump in the size of the largest cluster in Fig. 4a. The
same process is observed in the second percolation
transition at $s_{c2}$, when the red bond, Rich county, UT, joins the
Eastern and Western clusters for a whole-country
percolation.



III. SCALING EXPONENTS OF PERCOLATION 
CLUSTERS 

The scaling properties characterizing the geometry and
distribution of clusters at percolation are [16]:

(i) The scaling of the number of boxes $N_B$ needed to cover the
infinite spanning cluster versus the size of the boxes $\epsilon$:

$$N_B(\epsilon) \sim \epsilon^{-d_f}, \qquad (4)$$

defining the fractal dimension of the spanning cluster, $d_f$.

(ii) The number of boxes, $N_h$, of size $\epsilon$ covering the
perimeter of the infinite cluster:

$$N_h(\epsilon) \sim \epsilon^{-d_e}, \qquad (5)$$

defining the hull fractal dimension, $d_e$.

(iii) The probability distribution of the area of clusters
at percolation:

$$P(A) \sim A^{-\tau}, \qquad (6)$$

characterized by the critical exponent $\tau$. Additionally,
there is a scaling relation between the fractal dimension
and the cluster distribution exponent [16]: $\tau = 1 + 2/d_f$.
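
The box-counting definitions in Eqs. (4) and (5) can be sketched on a rasterized cluster. The code below is an illustrative sketch under stated assumptions: the cluster (or its hull) is assumed to be given as a boolean pixel mask, box sizes are in pixels, and the function names are hypothetical.

```python
import numpy as np

def box_count(mask, box_sizes):
    """Number of boxes of side e (in pixels) that contain part of the cluster.

    mask      : 2D boolean array, True where the cluster (or its hull) lies
    box_sizes : iterable of positive integer box sides
    """
    mask = np.asarray(mask, dtype=bool)
    counts = []
    for e in box_sizes:
        # Pad with empty pixels so the grid divides evenly into e-by-e boxes.
        ny = -(-mask.shape[0] // e)
        nx = -(-mask.shape[1] // e)
        padded = np.zeros((ny * e, nx * e), dtype=bool)
        padded[:mask.shape[0], :mask.shape[1]] = mask
        blocks = padded.reshape(ny, e, nx, e)
        counts.append(int(blocks.any(axis=(1, 3)).sum()))
    return np.array(counts)

def fractal_dimension(mask, box_sizes):
    """Slope of log N(e) versus log e, with the sign flipped (N ~ e^(-d))."""
    counts = box_count(mask, box_sizes)
    slope, _ = np.polyfit(np.log(box_sizes), np.log(counts), 1)
    return -slope
```

As a sanity check of the method, a filled square gives a dimension of 2 and a straight line gives 1; applying the same function to a mask of the cluster boundary would give the hull dimension $d_e$.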

The exponents $(d_f, d_e, \tau)$ for percolation with
long-range correlations have been calculated numerically in
[19, 20] as a function of the correlation exponent $\gamma$ using
standard percolation analysis. There also exists a
theoretical prediction based on the Renormalization Group in
[18] for the correlation length exponent. The values of
$(d_f, d_e, \tau)$ for the obesity clusters at the first percolation
transition, $s_{c1}$, are reported in the main text. Direct
computer simulations of long-range percolation [19, 20]
for $\gamma = 0.5$ find the values of the three geometric
exponents to be $(d_f, d_e, \tau) = (1.9 \pm 0.1, 1.39 \pm 0.03, 2.05 \pm 0.08)$,
consistent with those reported here.

We notice that the exponent $\tau$ is expected to be larger
than 2. This is due to mass conservation, assuming that
the power law of Eq. (6) extends to infinity at percolation
in a system of infinite size. The fact that we find a value
slightly smaller than 2 for the obesity clusters might be
due to a finite-size effect. We also notice that the values
of the exponents obtained from correlated percolation at
$\gamma = 0.5$ are not too far from those of uncorrelated
percolation [12]. Therefore, the values of the exponents may
not be enough to precisely compare the obesity clusters
with long-range percolation clusters. However, they serve
as an indication that the obesity clusters have the
geometrical properties of clusters at a critical point, such
as scaling behavior. Furthermore, it is possible
that long-range correlated percolation captures only
part of the dynamics of the clustering epidemic. It could
be, for instance, that higher-order correlations, beyond
the two-point correlation captured by $C(r)$, are also
relevant in determining the value of the exponents. In this
case, our analysis should be supplemented by studies of
n-point correlation functions, beyond $C(r)$.
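
The mass-conservation argument for $\tau > 2$ can be made explicit. The following is a sketch under the assumption that the power law of Eq. (6) holds for all areas above a lower cutoff $A_0$ (a symbol introduced here only for illustration): the total area contained in clusters must be finite, which constrains the tail of the distribution.

```latex
\int_{A_0}^{\infty} A \, P(A) \, dA
  \;\propto\; \int_{A_0}^{\infty} A^{1-\tau} \, dA
  \;=\; \left[ \frac{A^{2-\tau}}{2-\tau} \right]_{A_0}^{\infty}
  \;=\; \frac{A_0^{2-\tau}}{\tau-2}
  \qquad \text{for } \tau > 2 ,
```

whereas the integral diverges for $\tau \le 2$. In a finite system the upper limit is replaced by a finite maximum area, so the integral stays finite for any $\tau$; this is why a measured value slightly below 2 is not excluded by this argument alone.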
TABLE I: List of countries used to calculate the correlation
length, $\xi$, from the correlation function of the population
density. Data are obtained from [24]. L is calculated as the square
root of the total area of the country.



Country name                 L (km)     $\xi$ (km)
Liechtenstein                12.65      11.5
Malta                        17.75      10
Andorra                      21.54      8.2
Bahrain                      24.96      14.5
Hong-Kong                    32.46      13.7
Luxembourg                   50.86      26.24
Cyprus                       96.29      14.5
Kuwait                       131.37     27.5
Slovenia                     142.21     26
El Salvador                  142.40     33.5
Burundi                      158.83     71.5
Albania                      168.36     67.8
Belgium                      174.79     98.3
Switzerland                  197.42     85
Netherlands                  203.38     84
Dominican Republic           219.29     67.7
Lithuania                    254.94     26.5
Ireland                      263.58     50.5
Czech Republic               280.38     35.2
Hungary                      303.39     46.5
Bulgaria                     333.63     38.5
Uruguay                      417.11     114
United Kingdom               497.18     321
Oman                         551.53     290
Poland                       557.85     202

IV. COVARIANCE 

Our approach supplements covariance analysis [7, 30]:
we use physics concepts to offer a different view
of the spreading of epidemics. Our approach can be
extended to the study of the geographical spreading of any
epidemic in which spatial spreading plays an important
role, from diabetes and lung cancer to the spreading
of viruses or real estate bubbles.

Population correlations are naturally inherited by all
demographic observables. Even variables whose incidence
varies randomly from county to county would exhibit
spatial correlations in their absolute values, simply
because their numbers increase in more populated counties
and population locations are correlated. Indeed, the
absolute number of obese adults per county is directly
proportional to the population of the county [Bettencourt,
L. M. A., et al. Growth, innovation, scaling, and the
pace of life in cities. Proc. Natl. Acad. Sci. USA 104,
7301-7306 (2007)]. Our aim is to measure spatial fluctuations
in the frequency of incidence, independent of
population agglomeration. Thus, spatial correlations of
all indicators ought to be calculated on the density,
defined in the case of obesity as $s_i = o_i/p_i$, with $p_i$ the
population of county $i$, rather than on the absolute number
of obese people, $o_i$, itself. The spatial correlations of the
fluctuations of $s_i$ from the global average capture the
collective behavior expressed in the power law obeyed by
the correlation function $C(r)$.

TABLE II: Continuation of Table I.

Country name                 L (km)     $\xi$ (km)
Congo                        585.86     440
Germany                      596.68     123
Japan                        609.68     157
Zimbabwe                     623.74     81
Paraguay                     629.19     330
Iraq                         656.18     142
France                       739.68     83
Kenya                        761.33     345
Ukraine                      767.08     82
Madagascar                   770.04     227
Zambia                       863.317    190
Pakistan                     886.183    438
Nigeria                      950.913    293
Venezuela                    954.756    565
Bolivia                      1034.09    350
Ethiopia                     1060.05    370
South Africa                 1103.47    588
Iran                         1261.09    618
Saudi Arabia                 1392.42    618
Mexico                       1393.92    622
Congo Democratic Republic    1520.99    420
India                        1791.57    495
Australia                    2763.09    565
Brazil                       2912.11    660
United States of America     3034.92    1050
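
The normalization by population discussed in Section IV can be sketched as follows. The density is computed as the ratio of obese adults to population in each county, and a correlation of its fluctuations is estimated over pairs of counties separated by a given distance. The estimator used here (mean product of fluctuations over pairs in a distance bin, divided by the variance) is an assumed, commonly used form and is not necessarily the exact definition of C(r) used in the paper; the coordinates and counts in the usage example are made up.

```python
import math


def densities(obese, population):
    """Per-county density s_i = o_i / p_i."""
    return [o / p for o, p in zip(obese, population)]


def correlation(points, values, r, dr):
    """Assumed estimator of C(r): mean of (s_i - mean)(s_j - mean) / variance
    over pairs of counties whose distance lies in [r, r + dr)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if r <= d < r + dr:
                total += (values[i] - mean) * (values[j] - mean)
                pairs += 1
    if pairs == 0 or var == 0:
        return float("nan")  # no pairs in this bin, or no fluctuations at all
    return total / (pairs * var)


if __name__ == "__main__":
    # Hypothetical counties on a line, with made-up counts.
    points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    obese = [10, 40, 10, 40]
    population = [100, 200, 100, 200]
    s = densities(obese, population)
    for r in (1.0, 2.0, 3.0):
        print(f"r = {r:.0f}  C(r) = {correlation(points, s, r, 0.5)}")
```

Working with the density rather than the raw counts removes the trivial correlations that any count inherits from the population itself.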



